Towards Efficient MapReduce Using MPI
Abstract
MapReduce is an emerging programming paradigm for data-parallel applications. We discuss common strategies to implement a MapReduce runtime and propose an optimized implementation on top of MPI. Our implementation combines redistribution and reduce and moves them into the network. This approach especially benefits applications with a limited number of output keys in the map phase. We also show how anticipated MPI-2.2 and MPI-3 features, such as MPI_Reduce_local and nonblocking collective operations, can be used to implement and optimize MapReduce with a performance improvement of up to 25% on 127 cluster nodes. Finally, we discuss additional features that would enable MPI to support all MapReduce applications more efficiently.
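The idea of folding redistribution into the reduce step can be illustrated with a small simulation. The sketch below is an assumption-laden illustration in plain Python, not the authors' implementation, and it does not call MPI: each list entry stands in for one rank, `map_phase` (a hypothetical name) plays the role of the map plus a local combine, and `tree_reduce` (also hypothetical) mimics the pairwise combining that a reduction collective performs. Word counting is used only as an example workload with a small key set.

```python
from collections import Counter


def map_phase(chunk):
    """Map one rank's input and combine it locally.

    Emits one count per word; the Counter merges duplicate keys,
    which is the role a local combine step would play on each rank.
    """
    return Counter(chunk.split())


def tree_reduce(partials):
    """Combine per-rank results pairwise, in a binomial-tree pattern.

    This only simulates the communication pattern of a reduction
    collective; no data is actually sent over a network here.
    """
    partials = list(partials)
    if not partials:
        return Counter()
    step = 1
    while step < len(partials):
        # "Rank" i receives from "rank" i + step and merges the counts.
        for i in range(0, len(partials) - step, 2 * step):
            partials[i] = partials[i] + partials[i + step]
        step *= 2
    return partials[0]


if __name__ == "__main__":
    # Four simulated ranks, each holding one chunk of the input.
    chunks = ["a b a", "b c", "a c c", "b"]
    result = tree_reduce(map_phase(c) for c in chunks)
    print(dict(result))
```

Because every rank holds the same small set of keys, the partial results can be merged as they travel up the tree, so no separate key-by-key shuffle is needed. This is the case the abstract describes as benefiting most: a limited number of output keys in the map phase.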
Similar resources
A MapReduce and MPI Programming Model for Distributed Large Scale 3D Mesh Processing
Developing a high-performance platform for large-scale, high-intensity data processing is a priority for research on cost-effective parallel finite element methods (FEM). This paper introduces an efficient MapReduce-MPI based strategy for parallel 3D finite element mesh processing and demonstrates the potential benefits of this approach for optimally utilizing system resources. Preliminary experim...
MPI for Big Data: New tricks for an old dog
The processing of massive amounts of data on clusters with a finite amount of memory has become an important problem facing the parallel/distributed computing community. While MapReduce-style technologies provide an effective means for addressing various problems that fit within the MapReduce paradigm, there are many classes of problems for which this paradigm is ill-suited. In this paper we pres...
Genetic Algorithms with Mapreduce Runtimes
Data-intensive computing has played a key role in processing vast volumes of data by exploiting massive parallelism. Parallel computing frameworks have proven that terabytes of data can be routinely processed. MapReduce is a parallel programming model and associated implementation introduced by Google, one of the leading companies in IT. Genetic algorithms have increasingly been applied on parall...
Pattern matching of signature-based IDS using Myers algorithm under MapReduce framework
The rapid increase in wired Internet speed and the constant growth in the number of attacks make network protection a challenge. Intrusion detection systems (IDSs) play a crucial role in discovering suspicious activities and also in preventing their harmful impact. Existing signature-based IDSs have significant overheads in terms of execution time and memory usage mainly due to the pattern matc...
A Tree Algorithm Based on Parallel Cloud Computing Model
Cloud computing developed from parallel computing, distributed computing, and grid computing. With its advancement, the design of efficient distributed tree algorithms is receiving more and more attention. Constrained by the parallel assumption, parallel tree algorithms are not easy to express in MapReduce. Inspired by the Bulk Synchronous Parallel model, we propose an enhanc...